Deep Contextualized Acoustic Representations For Semi-Supervised Speech Recognition
We propose a novel approach to semi-supervised automatic speech recognition
(ASR). We first exploit a large amount of unlabeled audio data via
representation learning, where we reconstruct a temporal slice of filterbank
features from past and future context frames. The resulting deep contextualized
acoustic representations (DeCoAR) are then used to train a CTC-based end-to-end
ASR system using a smaller amount of labeled audio data. In our experiments, we
show that systems trained on DeCoAR consistently outperform ones trained on
conventional filterbank features, giving 42% and 19% relative improvement over
the baseline on WSJ eval92 and LibriSpeech test-clean, respectively. Our
approach can drastically reduce the amount of labeled data required;
unsupervised pretraining on LibriSpeech followed by supervised training with
100 hours of labeled data achieves performance on par with training directly
on all 960 hours.
Pre-trained models and code will be released online.
Comment: Accepted to ICASSP 2020 (oral)
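The pretraining objective above — reconstructing a temporal slice of filterbank features from past and future context frames — can be sketched as follows. This is a toy illustration, not the paper's implementation: the `naive_predict` stand-in (averaging the nearest context frames) replaces the deep bidirectional network DeCoAR actually trains, and all shapes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a (T, 40)-dim log-filterbank feature sequence.
T, F = 20, 40
feats = rng.standard_normal((T, F))

def reconstruction_loss(feats, t, slice_len, predict):
    """L1 loss for reconstructing the slice feats[t : t+slice_len]
    from the past (feats[:t]) and future (feats[t+slice_len:]) frames."""
    past, future = feats[:t], feats[t + slice_len:]
    target = feats[t : t + slice_len]
    pred = predict(past, future, slice_len)
    return np.abs(pred - target).mean()

# Hypothetical "predictor": average the nearest past and future frames and
# broadcast over the slice; a real model would use learned deep networks.
def naive_predict(past, future, slice_len):
    ctx = (past[-1] + future[0]) / 2.0
    return np.tile(ctx, (slice_len, 1))

loss = reconstruction_loss(feats, t=8, slice_len=4, predict=naive_predict)
print(float(loss))
```

Because the target slice is held out and predicted only from surrounding context, minimizing this loss forces the representation to encode contextual acoustic structure, which is what makes the learned features useful downstream.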
On lexical level matching
In many natural language understanding applications, text processing requires comparing lexical units: words, phrases, named entities, and sentences. A significant amount of research has gone into studying and evaluating similarity metrics between those units. In this thesis, we summarize research on computing lexical similarity and describe a new approach to computing the similarity between two spans of text, using multiple semantic-unit-level comparison measures to compute sentence-level similarity scores.
Contextual Phonetic Pretraining for End-to-end Utterance-level Language and Speaker Recognition
Pretrained contextual word representations in NLP have greatly improved
performance on various downstream tasks. For speech, we propose contextual
frame representations that capture phonetic information at the acoustic frame
level and can be used for utterance-level language, speaker, and speech
recognition. These representations come from the frame-wise intermediate
representations of an end-to-end, self-attentive ASR model (SAN-CTC) on spoken
utterances. We first train the model on the Fisher English corpus with
context-independent phoneme labels, then use its representations at inference
time as features for task-specific models on the NIST LRE07 closed-set language
recognition task and a Fisher speaker recognition task, giving significant
improvements over the state-of-the-art on both (e.g., language EER of 4.68% on
3sec utterances, 23% relative reduction in speaker EER). Results remain
competitive when using a novel dilated convolutional model for language
recognition, or when ASR pretraining is done with character labels only.
Comment: submitted to INTERSPEECH 201
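The step from frame-level to utterance-level in this abstract — taking frame-wise intermediate representations and turning them into fixed-size inputs for language/speaker classifiers — can be sketched minimally. Everything here (the 512-dim width, mean pooling as the aggregation) is an assumed placeholder for whatever the task-specific model actually consumes:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical frame-wise representations from an intermediate ASR layer:
# one (T, D) array per utterance, with T varying by utterance length.
utterances = [rng.standard_normal((t, 512)) for t in (120, 87, 301)]

def utterance_embedding(frames):
    """Collapse frame-level features into a fixed-size utterance vector
    by mean pooling over time (one simple aggregation choice)."""
    return frames.mean(axis=0)

# Variable-length utterances now become equal-size vectors that a
# downstream language- or speaker-recognition backend can score.
embeddings = np.stack([utterance_embedding(u) for u in utterances])
print(embeddings.shape)
```

The key property is that the frozen pretrained ASR model does the phonetic feature extraction once, and only the small pooled-vector classifier is trained per task.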
Hybrid Attention-based Encoder-decoder Model for Efficient Language Model Adaptation
The attention-based encoder-decoder (AED) speech recognition model has been
widely successful in recent years. However, jointly optimizing the acoustic
model and language model in an end-to-end manner creates challenges for text
adaptation. In particular, adapting to new text effectively, quickly, and
inexpensively has become a primary concern for deploying AED systems in
industry. To address this issue, we propose a novel model, the hybrid
attention-based encoder-decoder (HAED) speech recognition model, which
preserves the modularity of conventional hybrid automatic speech recognition
systems. Our HAED model separates the acoustic and language models, allowing
conventional text-based language model adaptation techniques to be applied.
We demonstrate that the proposed HAED model yields a 21% relative Word Error
Rate (WER) improvement when out-of-domain text data is used for language
model adaptation, with only a minor WER degradation on a general test set
compared with a conventional AED model.
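The practical payoff of separating the acoustic and language models can be sketched with a single decoding step. This toy example is an assumption-laden illustration of the general modular-decoding idea (log-linear combination of acoustic and LM scores with a weight), not HAED's specific formulation; all scores, words, and the 0.3 weight are made up:

```python
# Toy scores for one decoding step over a three-word vocabulary.
vocab = ["the", "a", "cat"]
acoustic_logp = {"the": -1.0, "a": -1.5, "cat": -0.9}   # from acoustic model
source_lm_logp = {"the": -0.7, "a": -1.2, "cat": -3.0}  # general-domain LM
adapted_lm_logp = {"the": -0.9, "a": -1.1, "cat": -0.4} # text-adapted LM

def rescore(word, lm_logp, lm_weight=0.3):
    """Because the LM is a separate module, swapping in an adapted lm_logp
    changes the decoding scores without touching the acoustic model."""
    return acoustic_logp[word] + lm_weight * lm_logp[word]

best_before = max(vocab, key=lambda w: rescore(w, source_lm_logp))
best_after = max(vocab, key=lambda w: rescore(w, adapted_lm_logp))
print(best_before, best_after)  # prints "the cat"
```

With a joint end-to-end AED model, the LM knowledge is entangled in the decoder weights, so this kind of cheap text-only swap is not available; modularity is what makes text-based adaptation inexpensive.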
Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition
Most end-to-end (E2E) speech recognition models are composed of encoder and
decoder blocks that perform acoustic and language modeling functions.
Pretrained large language models (LLMs) have the potential to improve the
performance of E2E ASR. However, integrating a pretrained language model into
an E2E speech recognition model has shown limited benefits due to the
mismatches between text-based LLMs and those used in E2E ASR. In this paper, we
explore an alternative approach by adapting a pretrained LLMs to speech. Our
experiments on fully-formatted E2E ASR transcription tasks across various
domains demonstrate that our approach can effectively leverage the strengths of
pretrained LLMs to produce more readable ASR transcriptions. Our model, which
is based on the pretrained large language models with either an encoder-decoder
or decoder-only structure, surpasses strong ASR models such as Whisper, in
terms of recognition error rate, considering formats like punctuation and
capitalization as well
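One common pattern for adapting a text LLM to speech is to bridge acoustic encoder outputs into the LLM's embedding space. The sketch below shows that bridging shape-wise only; the subsample-then-project recipe, the dimensions, and every name here are assumptions for illustration, not the paper's described architecture:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical shapes: an acoustic encoder emits (T, 256) frame states;
# the pretrained LM expects (T', 1024) token-like embeddings.
T, d_audio, d_lm = 50, 256, 1024
audio_states = rng.standard_normal((T, d_audio))

# Assumed bridging module: subsample frames to a token-like rate, then
# linearly project into the LM width (the projection would be learned).
proj = rng.standard_normal((d_audio, d_lm)) * 0.02

def bridge(audio_states, stride=4):
    pooled = audio_states[::stride]  # reduce the frame rate
    return pooled @ proj             # map into the LM embedding space

lm_inputs = bridge(audio_states)
print(lm_inputs.shape)
```

Once acoustic states live in the LM's input space, the decoder can attend to them like a (long) prompt, which is what lets the pretrained LM's formatting knowledge — punctuation, capitalization — carry over into the transcription.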
Induction chemotherapy‐based organ‐preservation protocol improves function preservation compared with immediate total laryngectomy for locally advanced hypopharyngeal cancer—Results of a matched‐pair analysis
Abstract Background We performed a matched-pair analysis to compare the therapeutic effect of the induction chemotherapy‐based organ‐preservation approach with that of immediate total laryngectomy in hypopharyngeal squamous cell carcinoma patients requiring total laryngectomy. Methods A total of 351 patients treated with the organ‐preservation approach were compared with 110 patients treated with total laryngectomy. The main outcome measures were progression‐free survival (PFS), overall survival (OS), and larynx function preservation survival (LFPS). Results No statistical difference was observed in 3‐, 5‐, or 10‐year PFS or OS between the two groups. In the organ‐preservation group, the 3‐, 5‐, and 10‐year LFPS was 30.7%, 23.3%, and 16.6%, respectively. LFPS decreased in the order Stage III > Stage IV, N0 > N1 > N2 > N3, T2 > T3 > T4, and CR > PR > SD > PD (all p values <0.05). Conclusions Survival outcomes did not significantly differ between the two groups. The organ‐preservation approach allowed more than 70% of the survivors to retain their larynx function.
LGR5 marks targetable tumor-initiating cells in mouse liver cancer
Cancer stem cells (CSCs) or tumor-initiating cells (TICs) are thought to be the main drivers of disease progression and treatment resistance across various cancer types. Identifying and targeting these rare cancer cells, however, remains challenging with respect to therapeutic benefit. Here, we report the enrichment of cells expressing LGR5, a well-recognized stem cell marker, in mouse liver tumors, and the upregulation of LGR5 expression in human hepatocellular carcinoma. LGR5-expressing cells isolated from mouse liver tumors are superior in initiating organoids and forming tumors upon engraftment, featuring candidate TICs. These cells are resistant to conventional treatments, including sorafenib and 5-FU. Importantly, LGR5 lineage ablation significantly inhibits organoid initiation and tumor growth. Combining LGR5 ablation with 5-FU, but not sorafenib, further augments the therapeutic efficacy in vivo. Thus, we have identified the LGR5+ compartment as an important TIC population, representing a viable therapeutic target for combating liver cancer.